Graph-based Semi-supervised Gene Mention Tagging
نویسندگان
چکیده
The rapidly growing biomedical literature has been a challenging target for natural language processing algorithms. One of the tasks these algorithms focus on is called named entity recognition (NER), often employed to tag gene mentions. Here we describe a new approach for this task, an approach that uses graphbased semi-supervised learning to train a Conditional Random Field (CRF) model. Benchmarking it on the BioCreative II Gene Mention tagging task, we achieved statistically significant improvements in Fmeasure over BANNER, a widely used biomedical NER system. We note that our tool is transductive and modular in nature, and can be integrated with other CRF-based supervised NER tools.
منابع مشابه
Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to partof-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar ngrams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the...
متن کاملScientific Information Extraction with Semi-supervised Neural Tagging
This paper addresses the problem of extracting keyphrases from scientific articles and categorizing them as corresponding to a task, process, or material. We cast the problem as sequence tagging and introduce semi-supervised methods to a neural tagging model, which builds on recent advances in named entity recognition. Since annotated training data is scarce in this domain, we introduce a graph...
متن کاملGraph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
This paper introduces a graph-based semisupervised joint model of Chinese word segmentation and part-of-speech tagging. The proposed approach is based on a graph-based label propagation technique. One constructs a nearest-neighbor similarity graph over all trigrams of labeled and unlabeled data for propagating syntactic information, i.e., label distributions. The derived label distributions are...
متن کاملSemi-Supervised Learning of Sequence Models with Method of Moments
We propose a fast and scalable method for semi-supervised learning of sequence models, based on anchor words and moment matching. Our method can handle hidden Markov models with feature-based log-linear emissions. Unlike other semi-supervised methods, no decoding passes are necessary on the unlabeled data and no graph needs to be constructed— only one pass is necessary to collect moment statist...
متن کاملSemi-Supervised Learning of Sequence Models with the Method of Moments
We propose a fast and scalable method for semi-supervised learning of sequence models, based on anchor words and moment matching. Our method can handle hidden Markov models with feature-based log-linear emissions. Unlike other semi-supervised methods, no decoding passes are necessary on the unlabeled data and no graph needs to be constructed— only one pass is necessary to collect moment statist...
متن کامل